An algorithm with linear expected running time for string editing with substitutions and substring reversals
Author
Abstract
The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y without performing multiple operations at the same position. Ordinarily, string editing is based on character insert, delete, and substitute operations. Motivated by the facts that substring reversals are observed in genomic sequences, and that it is not always possible to transform a given sequence X into a given sequence Y by reversals alone (e.g. X is all 0s and Y is all 1s), Muthukrishnan and Sahinalp [7, 8] considered a “simple” well-defined edit distance model in which the edit operations are: replace a character, and reverse and replace a substring. A substring of X can only be reversed if the reversal results in a match in the same position in Y. The cost of each character replacement and substring reversal is 1. The distance in this case is defined only when |X| = |Y| = n. There is an algorithm for computing the distance in this model with worst-case time complexity O(n log^2 n) [8]. We present a dynamic programming algorithm with worst-case time complexity O(n^2) whose expected running time is O(n). In our dynamic programming solution the weights of edit operations can vary for different substitutions, and the costs of reversals can be a function of the reversal length.
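To make the model concrete, here is a minimal Python sketch of a straightforward quadratic-time dynamic program for it. It is not the paper's algorithm and makes no claim to linear expected time. The function name `reversal_edit_distance` and the parameters `sub_cost` and `rev_cost` are hypothetical, and the sketch assumes that a reversal of X[i..j] is allowed exactly when the reversed substring equals Y[i..j].

```python
def reversal_edit_distance(x, y, sub_cost=1, rev_cost=1):
    """Naive O(n^2) DP sketch for edit distance with substitutions and
    substring reversals (illustrative only, not the paper's algorithm)."""
    if len(x) != len(y):
        raise ValueError("the distance is defined only when |X| = |Y|")
    n = len(x)
    if n == 0:
        return 0

    # For each centre c = i + j, expand outwards while the symmetric pairs
    # agree, i.e. x[i] == y[j] and x[j] == y[i]. lo[c] is the smallest start
    # index i such that reversing x[i..c-i] yields y[i..c-i].
    lo = [0] * (2 * n - 1)
    for c in range(2 * n - 1):
        i, j = c // 2, c - c // 2
        while i >= 0 and j < n and x[i] == y[j] and x[j] == y[i]:
            i -= 1
            j += 1
        lo[c] = i + 1

    # d[k] is the minimum cost of turning x[:k] into y[:k].
    d = [0] * (n + 1)
    for k in range(1, n + 1):
        j = k - 1
        # Option 1: match or substitute the last character.
        d[k] = d[k - 1] + (0 if x[j] == y[j] else sub_cost)
        # Option 2: end a reversed block x[i..j] (length >= 2) at position j.
        for i in range(j - 1, -1, -1):
            if i >= lo[i + j]:
                d[k] = min(d[k], d[i] + rev_cost)
    return d[n]
```

For instance, `reversal_edit_distance("abcd", "dcba")` needs a single reversal, while `reversal_edit_distance("0000", "1111")` can only use substitutions. Passing different constants for `sub_cost` and `rev_cost` gives a rough feel for the weighted variant mentioned in the abstract, although the sketch does not support per-symbol or length-dependent costs.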
Similar resources
Approximating Reversal Distance for Strings with Bounded Number of Duplicates
For a string $A = a_1 \dots a_n$, a reversal $\rho(i, j)$, $1 \le i < j \le n$, transforms the string $A$ into a string $A' = a_1 \dots a_{i-1} a_j a_{j-1} \dots a_i a_{j+1} \dots a_n$, that is, the reversal $\rho(i, j)$ reverses the order of symbols in the substring $a_i \dots a_j$ of $A$. In the case of signed strings, where each symbol is given a sign + or −, the reversal operation also flips the sign of each symbol in the reversed substr...
Sorting by reversals
One of the most studied problems in the field of computational biology is the string matching problem. Much of the research has focused on developing efficient algorithms for transforming one string into another by minimizing the number of steps. The biological motivation for this problem comes from the fact that DNA sequences get transformed by a series of basic operations such as mutations, ...
Selecting the shortest superstring in DNA using the particle swarm algorithm
A DNA string can be regarded as a very long string over an alphabet of 4 letters. Many scientists are working on decoding this string. Since the string is very long, shorter sections of it that overlap one another are decoded. There is no information about the correct position of these sections on the main DNA string. It seems that the shortest string (substring of the main DNA string) i...
Fast String Matching using an n-gram Algorithm
Recently an algorithm was developed for the substring search problem with expected running time provably fast when the text string was drawn from a stationary ergodic source. The theory of ergodic sources was developed by Shannon and others as a theoretical model of natural language. The model is based on the assumption that the probabilities of substrings are invariant to shifts of position to...
Fast string matching by using probabilities: On an optimal mismatch variant of Horspool's algorithm
The string matching problem, i.e. the task of finding all occurrences of one string as a substring of another one, is a fundamental problem in computer science. Recently, this problem received a great deal of attention due to numerous applications in computational biology. In this paper we present a modified version of Horspool’s string matching algorithm using the probabilities of the differen...
Journal title:
- Inf. Process. Lett.
Volume 106, Issue
Pages -
Publication date 2008